In-Core Optimization of High-Order Stencil Computations

Authors

Hikmet Dursun

Ken-ichi Nomura

Weiqiang Wang

Manaschai Kunaseth

Liu Peng

Richard Seymour

Rajiv K. Kalia

Aiichiro Nakano

Priya Vashishta

Abstract

In this paper, we apply in-core optimization techniques to high-order stencil computations, including: (1) cache blocking for efficient L2 cache use; (2) register blocking and data-level parallelism via single-instruction multiple-data (SIMD) techniques to increase L1 cache efficiency; and (3) software prefetching techniques. Our generic approach is tested with a kernel extracted from a 6th-order stencil-based seismic wave propagation code on a suite of Intel Xeon architectures. Cache blocking and prefetching are found to achieve modest performance improvements, whereas register blocking and the SIMD implementation dramatically reduce L1 cache line misses, accompanied by a moderate decrease in the L2 cache miss rate. Optimal register blocking sizes are determined by analyzing the cache performance of the stencil kernel for different register block sizes, thereby achieving over 4.3-fold speedup on Intel Harpertown. We also examine lower-precision (3rd-, 4th-, and 5th-order) stencil computations to analyze the dependency of data-level parallel efficiency on the stencil order.
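The cache-blocking idea named in the abstract can be illustrated with a minimal sketch. It is not the authors' kernel: the 2D grid, the function names, the block sizes, and the use of the standard 6th-order-accurate central-difference weights for a second derivative are all assumptions made for illustration. The sketch only shows the tiled loop structure and checks that it produces the same result as the untiled sweep; in interpreted Python no cache benefit should be expected.

```python
# Assumed weights: standard 6th-order-accurate central difference for a
# second derivative. The paper's actual kernel coefficients are not given here.
W = [-49.0 / 18.0, 3.0 / 2.0, -3.0 / 20.0, 1.0 / 90.0]
R = len(W) - 1  # stencil radius: 3 neighbours on each side along each axis


def point_update(u, i, j):
    """Apply the 2D star-shaped stencil at grid point (i, j)."""
    acc = 2.0 * W[0] * u[i][j]  # centre weight, once per axis
    for r in range(1, R + 1):
        acc += W[r] * (u[i - r][j] + u[i + r][j] + u[i][j - r] + u[i][j + r])
    return acc


def stencil_naive(u):
    """Sweep all interior points in plain row-major order."""
    n, m = len(u), len(u[0])
    out = [[0.0] * m for _ in range(n)]
    for i in range(R, n - R):
        for j in range(R, m - R):
            out[i][j] = point_update(u, i, j)
    return out


def stencil_blocked(u, bi=8, bj=8):
    """Same sweep, tiled into bi-by-bj blocks (hypothetical block sizes).

    Each tile is finished before moving on, so the neighbours it reads are
    reused while they are still likely to be resident in cache.
    """
    n, m = len(u), len(u[0])
    out = [[0.0] * m for _ in range(n)]
    for ii in range(R, n - R, bi):          # loop over tiles
        for jj in range(R, m - R, bj):
            for i in range(ii, min(ii + bi, n - R)):      # loop inside a tile
                for j in range(jj, min(jj + bj, m - R)):
                    out[i][j] = point_update(u, i, j)
    return out


def make_grid(n, m):
    """Test field u(i, j) = i^2 + j^2, whose Laplacian is exactly 4."""
    return [[float(i * i + j * j) for j in range(m)] for i in range(n)]
```

Calling `stencil_blocked(make_grid(20, 17), bi=4, bj=5)` visits the same interior points as `stencil_naive` and performs the same arithmetic at each one, so the two outputs agree exactly. In a compiled implementation the block sizes would be tuned to the target cache, which is the kind of parameter search the abstract describes for register blocks.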

Similar resources

PATUS: A Code Generation and Auto-Tuning Framework For Parallel Stencil Computations

PATUS is a code generation and auto-tuning framework for stencil computations targeted at modern multi- and many-core processors, such as multicore CPUs and graphics processing units. Its ultimate goals are to provide a means towards productivity and performance on current and future multi- and many-core platforms. The framework generates the code for a compute kernel from a specification of the st...

Cache based optimization of stencil computations: an algorithmic approach

We are witnessing a fundamental paradigm shift in computer design. Memory has been and is becoming more hierarchical. Clock frequency is no longer crucial for performance. The on-chip core count is doubling rapidly. The quest for performance is growing. These facts have led to complex computer systems which bestow high demands on scientific computing problems to achieve high performance. Stenc...

A Generalized Framework for Auto-tuning Stencil Computations

This work introduces a generalized framework for automatically tuning stencil computations to achieve superior performance on a broad range of multicore architectures. Stencil (nearest-neighbor) based kernels constitute the core of many important scientific applications involving block-structured grids. Auto-tuning systems search over optimization strategies to find the combination of tunable p...

Automatically Optimizing Stencil Computations on Many-Core NUMA Architectures

This paper presents a system for automatically supporting the optimization of stencil kernels on emerging Non-Uniform Memory Access (NUMA) many-core architectures, through a combined compiler + runtime approach. In particular, we use a pragma-driven compiler to recognize the special structures and optimization needs of stencil computations and thereby to automatically generate low-level code tha...

A Domain-Specific Language and Compiler for Stencil Computations on Short-Vector SIMD and GPU Architectures

Stencil computations are an integral part of applications in a number of scientific computing domains, such as image processing and partial differential equations. We describe a domain-specific language for regular stencil computations that allows specification of the computations in a concise manner. We describe a multi-target compiler for this DSL that generates optimized code for multi-cor...

Publication date: 2009
